When most people think of random sampling, they think of Simple Random Sampling. If we have 1000 observations in our sample and we only want 100, simple random sampling will pick 100 from the 1000 with equal probability.
The main benefit of stratified sampling is that it increases the likelihood that our sample remains an accurate sample of the population. This property is especially valuable when collecting additional observations is expensive.
When specifying subpopulations, the following conditions apply:
With some thought, we can split the population to take account of both age and arthritis status while satisfying both the conditions above:
Once we have specified our strata, we then need to determine the appropriate size of each stratum. This requires either knowing (or reasonably well estimating) the ratios of each stratum. As this will usually require parsing the entire dataset,
In [ ]: